Goto

Collaborating Authors

 generalization accuracy




Multi-Scenario Highway Lane-Change Intention Prediction: A Physics-Informed AI Framework for Three-Class Classification

Shi, Jiazhao, Lin, Yichen, Hua, Yiheng, Wang, Ziyu, Zhang, Zijian, Zheng, Wenjia, Song, Yun, Lu, Kuan, Lu, Shoufeng

arXiv.org Artificial Intelligence

Lane-change maneuvers are a leading cause of highway accidents, underscoring the need for accurate intention prediction to improve the safety and decision-making of autonomous driving systems. While prior studies using machine learning and deep learning methods (e.g., SVM, CNN, LSTM, Transformers) have shown promise, most approaches remain limited by binary classification, lack of scenario diversity, and degraded performance under longer prediction horizons. In this study, we propose a physics-informed AI framework that explicitly integrates vehicle kinematics, interaction feasibility, and traffic-safety metrics (e.g., distance headway, time headway, time-to-collision, closing gap time) into the learning process. lane-change prediction is formulated as a three-class problem that distinguishes left change, right change, and no change, and is evaluated across both straight highway segments (highD) and complex ramp scenarios (exiD). By integrating vehicle kinematics with interaction features, our machine learning models, particularly LightGBM, achieve state-of-the-art accuracy and strong generalization. Results show up to 99.8% accuracy and 93.6% macro F1 on highD, and 96.1% accuracy and 88.7% macro F1 on exiD at a 1-second horizon, outperforming a two-layer stacked LSTM baseline. These findings demonstrate the practical advantages of a physics-informed and feature-rich machine learning framework for real-time lane-change intention prediction in autonomous driving systems.


Deep Neural Nets with Interpolating Function as Output Activation

Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley Osher

Neural Information Processing Systems

And we propose end-to-end training and testing algorithms for this new architecture. Compared to classical neural nets with softmax function as output activation, the surrogate with interpolating function as output activation combines advantages of both deep and manifold learning.


c467978aaae44a0e8054e174bc0da4bb-Supplemental.pdf

Neural Information Processing Systems

In Appendix A.1, we describe the attributes and the comparisons in the VQA-MNIST datasets. A.1 Attributes and Comparisons Visual attributes. This corresponds to a total of 21 attribute instances. The sub-tasks for comparison of spatial relations are in Table App.2. Sub-tasks for attribute comparison between pairs of separated objects.



Boosting Generalization Performance in Model-Heterogeneous Federated Learning Using Variational Transposed Convolution

Niu, Ziru, Dong, Hai, Qin, A. K.

arXiv.org Artificial Intelligence

Federated learning (FL) is a pioneering machine learning paradigm that enables distributed clients to process local data effectively while ensuring data privacy. However, the efficacy of FL is usually impeded by the data heterogeneity among clients, resulting in local models with low generalization performance. To address this problem, traditional model-homogeneous approaches mainly involve debiasing the local training procedures with regularization or dynamically adjusting client weights in aggregation. Nonetheless, these approaches become incompatible for scenarios where clients exhibit heterogeneous model architectures. In this paper, we propose a model-heterogeneous FL framework that can improve clients' generalization performance over unseen data without model aggregation. Instead of model parameters, clients exchange the feature distributions with the server, including the mean and the covariance. Accordingly, clients train a variational transposed convolutional (VTC) neural network with Gaussian latent variables sampled from the feature distributions, and use the VTC model to generate synthetic data. By fine-tuning local models with the synthetic data, clients significantly increase their generalization performance. Experimental results show that our approach obtains higher generalization accuracy than existing model-heterogeneous FL frameworks, as well as lower communication costs and memory consumption


The effects of Hessian eigenvalue spectral density type on the applicability of Hessian analysis to generalization capability assessment of neural networks

Gabdullin, Nikita

arXiv.org Artificial Intelligence

Hessians of neural network (NN) contain essential information about the curvature of NN loss landscapes which can be used to estimate NN generalization capabilities. We have previously proposed generalization criteria that rely on the observation that Hessian eigenvalue spectral density (HESD) behaves similarly for a wide class of NNs. This paper further studies their applicability by investigating factors that can result in different types of HESD. We conduct a wide range of experiments showing that HESD mainly has positive eigenvalues (MP-HESD) for NN training and fine-tuning with various optimizers on different datasets with different preprocessing and augmentation procedures. We also show that mainly negative HESD (MN-HESD) is a consequence of external gradient manipulation, indicating that the previously proposed Hessian analysis methodology cannot be applied in such cases. We also propose criteria and corresponding conditions to determine HESD type and estimate NN generalization potential. These HESD types and previously proposed generalization criteria are combined into a unified HESD analysis methodology. Finally, we discuss how HESD changes during training, and show the occurrence of quasi-singular (QS) HESD and its influence on the proposed methodology and on the conventional assumptions about the relation between Hessian eigenvalues and NN loss landscape curvature.


Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization

Qin, Tian, Saphra, Naomi, Alvarez-Melis, David

arXiv.org Artificial Intelligence

Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn hierarchical syntactic representations to correctly apply grammatical rules out-of-distribution (OOD). In this work, we use case studies of English grammar to explore how complex, diverse training data drives models to generalize OOD. We construct a framework that unifies our understanding of random variation with training dynamics, rule selection with memorization, and data diversity with complexity. We show that these factors are nuanced, and that intermediate levels of diversity and complexity lead to inconsistent behavior across random seeds and to unstable training dynamics. Our findings emphasize the critical role of training data in shaping generalization patterns and illuminate how competing model strategies lead to inconsistent generalization outcomes across random seeds.


Investigating generalization capabilities of neural networks by means of loss landscapes and Hessian analysis

Gabdullin, Nikita

arXiv.org Artificial Intelligence

This paper studies generalization capabilities of neural networks (NNs) using new and improved PyTorch library Loss Landscape Analysis (LLA). LLA facilitates visualization and analysis of loss landscapes along with the properties of NN Hessian. Different approaches to NN loss landscape plotting are discussed with particular focus on normalization techniques showing that conventional methods cannot always ensure correct visualization when batch normalization layers are present in NN architecture. The use of Hessian axes is shown to be able to mitigate this effect, and methods for choosing Hessian axes are proposed. In addition, spectra of Hessian eigendecomposition are studied and it is shown that typical spectra exist for a wide range of NNs. This allows to propose quantitative criteria for Hessian analysis that can be applied to evaluate NN performance and assess its generalization capabilities. Generalization experiments are conducted using ImageNet-1K pre-trained models along with several models trained as part of this study. The experiment include training models on one dataset and testing on another one to maximize experiment similarity to model performance in the Wild. It is shown that when datasets change, the changes in criteria correlate with the changes in accuracy, making the proposed criteria a computationally efficient estimate of generalization ability, which is especially useful for extremely large datasets.